arXiv:1501.02971v1 [astro-ph.CO] 13 Jan 2015


Prepared for submission to JCAP 


Reconstructing equation of state of 
dark energy with principal 
component analysis 

Hao-Feng Qin, Xi-Bin Li, Hao-Yi Wan and Tong-Jie Zhang

Beijing Normal University,

No.19, Xinjiekouwai Street, Haidian District, Beijing, P.R. China 
E-mail: tjzhang@bnu.edu.cn 

Abstract. We present a method to reconstruct the equation of state of dark energy directly
from observational Hubble parameter data in a nonparametric way. We use principal
component analysis (PCA) to extract the signal from noisy data. In addition, we modify
the Akaike information criterion (AIC) to guarantee the quality of the reconstruction and to
avoid overfitting at the same time. The results show that our method is robust in reconstructing
the dark energy equation of state. Although current observational Hubble parameter data alone
cannot yet give a strong constraint, future observations with more accurate data can
significantly improve the quality of the reconstruction, which is consistent with the results of
H.-R. Yu et al.

1 Introduction

Distance measurements of type Ia supernovae (SNe Ia) indicate that the expansion of
the universe is accelerating [1-3]. This implies that some mechanism must provide a
repulsive effect. Many theories have been proposed to explain it; the most popular one is
the dark energy scenario, in which dark energy is a special kind of matter that provides
the repulsive force accelerating the expansion of the universe. However, its nature
remains unclear. To study the properties of dark energy, Turner and White [4] suggested
parameterizing it by its equation of state, w = P/ρ, where P is the pressure and
ρ is the energy density. Different dark energy models give different values of w (e.g., −1 for
vacuum energy, −N/3 for topological defects of dimensionality N), and w can also evolve
with time (e.g., in models with a rolling scalar field). The analysis of the Planck + WP and
BICEP2 data shows that the equation of state w ≠ −1 and that the bounce inflation scenario is
better than the standard cold dark matter model with a cosmological constant [5, 6]. Although
observations are consistent with Ω_m < 1 and a cosmological constant Λ > 0 [7], the possibility of
a time dependence of w or a coupling with cold dark matter cannot be excluded [8]. Current
observational data (SNe + WMAP5 + SDSS) also favor a model in which w(z) crosses −1
in the range z ∈ [0.25, 0.75] [9, 10]. In addition, on the theoretical level a constant Λ runs
into serious problems, since the present value of Λ is roughly 120 orders of magnitude smaller
than the prediction from most particle physics models. If Λ is not a constant, the dynamical
properties of dark energy become of interest [11, 12]. Since w describes these dynamical
properties well, determining w is vital to the study of dark energy.

Many different observational data sets can be used to determine the equation of state of
dark energy. CMB anisotropy, supernova (SNe) distance measurements and number
counts all appear promising. One can fit directly to SNe magnitudes or their luminosity
distances d_L(z), or to more indirect quantities such as the dark energy density ρ(z) or the
expansion history H(z) [13]. But even accurate measurements of d_L cannot constrain the small bumps
and wiggles that are crucial to the reconstruction of w. Without some amount of smoothing
of the cosmological measurements, reconstruction is impractical [12]. Since the combination of
number counts and supernova measurements can determine H(z) directly and eliminates
the dependence on the second derivative of d_L, using observational Hubble parameter data to
constrain w seems to be a good choice.

Recently, many works have focused on reconstructing the dark energy equation of state.
Parametric and non-parametric methods are the two general approaches to this problem.
Studying dark energy in a parameterized way, i.e., parameterizing the dark energy equation
of state in terms of known observables [8], e.g. the Chevallier-Polarski-Linder parametrization
[14, 15] and divergence-free parametrizations [16], may induce biased results due to prior
assumptions about the functional form of the equation of state [17]. Since the nature of dark
energy is still unknown, it is wiser to reconstruct its equation of state in a non-parametric way.
Reconstruction methods can also be divided into model-dependent
and model-independent methods. Model-dependent methods work within a particular model,
while model-independent methods have no such constraint. In fact, any such imposition can
cause biased results [18], as a consequence of the strong degeneracy between cosmological
models [19]. Therefore, a reconstruction of w(z) should ideally be carried out in a
model-independent manner [20].

In this paper, we apply principal component analysis (PCA), a useful non-parametric,
model-independent tool, to the reconstruction of the dark energy equation of state from
observational Hubble parameter data. Reconstructing w(z) in the usual way requires
derivatives of H(z), which in general increase the errors on the reconstructed equation
of state [17, 21]. To overcome this defect, we fit w(z) directly to the observational
Hubble parameter data through the integral that relates w(z) to H(z). Another problem
is that PCA can pick out the principal components of w(z), but how many components one
should keep is still an open question: too many components may cause overfitting, while too
few may cause underfitting. We therefore modify the Akaike information criterion (AIC) to
guarantee the quality of the reconstruction and to avoid overfitting.
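
To make the role of the derivative concrete, the relation between w(z) and H(z) can be written in both directions. The expressions below are the standard textbook result, stated here under the assumption of a flat universe containing only matter and dark energy; the symbols Ω_m, Ω_x and H_0 are the density parameters and Hubble constant used in section 2, and H'(z) denotes dH/dz.

```latex
H^2(z) = H_0^2\left[\Omega_m (1+z)^3
        + \Omega_x \exp\!\left(3\int_0^z \frac{1+w(z')}{1+z'}\,dz'\right)\right]
\quad\Longrightarrow\quad
w(z) = \frac{\tfrac{2}{3}\,(1+z)\,H(z)\,H'(z) - H^2(z)}
            {H^2(z) - H_0^2\,\Omega_m (1+z)^3}
```

The inverse relation contains H'(z), so noise in H(z) is amplified by the differentiation, whereas fitting through the integral form only requires evaluating H(z) for a trial w(z).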

This paper is organized as follows. First, we describe the reconstruction process of dark 
energy equation of state in section 2. Next, we explain the generation of simulated data 
in section 3. After that, we introduce the information criterion for component selection in
section 4. Then we show the results in section 5. Finally, we discuss the implications of our
results and draw conclusions in section 6.

2 Reconstruction process

We reconstruct the equation of state of dark energy under the assumption that the universe is
homogeneous, isotropic and governed by Einstein’s theory of gravitation. It is well known that
the metric for a space-time with homogeneous and isotropic spatial sections is the maximally
symmetric Friedmann-Robertson-Walker (FRW) metric. We also treat the content of the universe as
an ideal fluid. The Friedmann equation then gives

$$ H^2(z) = H_0^2 \left[ \Omega_x \exp\!\left( 3 \int_0^z \frac{1 + w(z')}{1 + z'} \, dz' \right) + \Omega_m (1+z)^3 + \Omega_k (1+z)^2 + \Omega_r (1+z)^4 \right], \tag{2.1} $$

where Ω_x is the dark energy density component, Ω_m is the matter density component, Ω_k is the
curvature component, and Ω_r is the radiation density component. According to the latest results
from Planck, we adopt H_0 = (67.4 ± 1.4) km s⁻¹ Mpc⁻¹, Ω_x = 0.6825, Ω_m = 0.3175, Ω_k = 0 and
Ω_r = 0 [22]. Then (2.1) reduces to

$$ H^2(z) = H_0^2 \left[ \Omega_x \exp\!\left( 3 \int_0^z \frac{1 + w(z')}{1 + z'} \, dz' \right) + \Omega_m (1+z)^3 \right], \tag{2.2} $$

from which we can calculate the theoretical Hubble parameter H_th(z) given an expression
for w(z). We can then fit the observational H_ob(z) data to constrain the form of w(z).
Using the least-squares method, we minimize

$$ \chi^2 = \sum_{i=1}^{N} \frac{\left[ H_{\rm ob}(z_i) - H_{\rm th}(z_i) \right]^2}{\sigma_i^2}, \tag{2.3} $$

where N is the number of data points, H_ob is the observational or simulated Hubble parameter
data, H_th is the theoretical Hubble parameter, and σ_i is the error of H_ob. However,
simple least-squares minimization cannot distinguish between
noise and signal, since the noise is absorbed into the result of the optimization. In order to
minimize the effect of noise, we use PCA to pick out the principal components of w(z).
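
As an illustration of how (2.2) and (2.3) can be evaluated numerically, here is a minimal sketch. It assumes NumPy and SciPy are available; the function names `hubble_th` and `chi_square` are hypothetical, and the constants are the Planck values quoted in the text. It is not the authors' code.

```python
import numpy as np
from scipy.integrate import quad

H0 = 67.4          # km/s/Mpc, value quoted in the text
OMEGA_X = 0.6825   # dark energy density parameter
OMEGA_M = 0.3175   # matter density parameter

def hubble_th(z, w):
    """Theoretical H(z) from eq. (2.2) for a callable w(z) (hypothetical helper)."""
    # Integral of (1 + w(z')) / (1 + z') from 0 to z
    integral, _ = quad(lambda zp: (1.0 + w(zp)) / (1.0 + zp), 0.0, z)
    de_term = OMEGA_X * np.exp(3.0 * integral)
    return H0 * np.sqrt(de_term + OMEGA_M * (1.0 + z) ** 3)

def chi_square(z_data, h_obs, sigma, w):
    """Eq. (2.3): sum of squared residuals weighted by the errors."""
    h_th = np.array([hubble_th(z, w) for z in z_data])
    return float(np.sum(((np.asarray(h_obs) - h_th) / np.asarray(sigma)) ** 2))

def w_lcdm(z):
    """Cosmological constant: w = -1 at all redshifts."""
    return -1.0
```

For w = −1 the integrand vanishes, so the dark energy term stays constant and the expression reduces to the familiar ΛCDM expansion rate. In a fit, `chi_square` would be minimized over the parameters that define w(z).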

The process of applying PCA to the reconstruction of the equation of state is as
follows.

Step 1. We choose a set of basis functions f_i(z), which are orthogonal and complete,
so that the equation of state w(z) can be expressed as

$$ w(z) = \sum_{i=1}^{N} \alpha_i f_i(z), \tag{2.4} $$

where α_i are the coefficients of the corresponding basis functions.

Step 2. We use (2.2) and least-squares optimization to derive the coefficients α_i. Repeating
this step a sufficient number of times gives a distribution of α.

Step 3. We calculate the covariance matrix of α, denoted by C.

Step 4. We calculate the eigenvalues λ_i and eigenvectors by diagonalizing the covariance
matrix C,

$$ C = E \Lambda E^{T}, \tag{2.5} $$

where Λ is the diagonal matrix whose diagonal entries are the eigenvalues, and the columns
of the matrix E are the corresponding eigenvectors. Then we define a new basis U = FE, where
F = (f_1, f_2, ..., f_N). As a linear combination of the original basis functions f_i, this new
basis is also orthogonal and complete, and u_i satisfies

$$ \mathrm{Var}(u_i) = \lambda_i, \quad i = 1, 2, \ldots, N. \tag{2.6} $$

The u_i with small variance are the relatively stable components, which vary little between
different observations. They are hence the components we need.

Step 5. We sort the components by λ_i in ascending order and choose the first M components
to reconstruct w(z). A criterion is needed to decide how many components should be
used, which will be discussed in section 4.

Step 6. We use the new basis functions and least-squares optimization to fit the H(z) data
again and obtain the new coefficients. Finally we get the equation of state

$$ w(z) = \sum_{i=1}^{M} \beta_i u_i(z), \tag{2.7} $$

where β_i are the new coefficients derived by least-squares optimization, and M is the number
of components used.
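
Steps 3 to 5 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the cosine basis in `basis` is a hypothetical choice (the text does not say which orthogonal basis is used), and the coefficient samples are synthetic random draws standing in for the repeated fits of step 2.

```python
import numpy as np

Z_MAX = 2.3  # upper redshift of the data quoted in the text

def basis(z, n_basis):
    """Hypothetical basis f_i(z) = cos(i*pi*z/Z_MAX), orthogonal on [0, Z_MAX]."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    i = np.arange(n_basis)
    return np.cos(np.pi * np.outer(z, i) / Z_MAX)    # shape (len(z), n_basis)

def principal_basis(alpha_samples):
    """Steps 3-4: covariance matrix C of the coefficients and C = E Lambda E^T."""
    cov = np.cov(alpha_samples, rowvar=False)
    # eigh returns eigenvalues in ascending order; columns of eigvecs are eigenvectors
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals, eigvecs

def new_basis(z, eigvecs, n_keep):
    """Step 5: u_i(z) = sum_j f_j(z) E_ji for the n_keep smallest-variance components."""
    return basis(z, eigvecs.shape[0]) @ eigvecs[:, :n_keep]

rng = np.random.default_rng(0)
# Synthetic stand-in for the coefficient vectors obtained from repeated fits (step 2)
scales = np.array([0.05, 0.1, 0.5, 1.0, 3.0])
alpha_samples = rng.normal(0.0, scales, size=(2000, 5))

eigvals, eigvecs = principal_basis(alpha_samples)
U = new_basis(np.linspace(0.0, Z_MAX, 50), eigvecs, n_keep=3)
```

Because `np.linalg.eigh` already sorts the eigenvalues in ascending order, the first columns of `eigvecs` are the low-variance components that the text selects. Step 6 would then refit the H(z) data using the columns of `U` as basis functions.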

Figure 1. Binned observational Hubble parameter data. Black dots are the binned observational Hubble
parameter data, which extend to z = 2.3. The bars show the deviation of the Hubble parameter data, which
is used to build an error model for the generation of simulated Hubble parameter data.

3 Generation of simulated data 

In order to validate the reconstruction process described in section 2, we use simulated
Hubble parameter data sets, generated from toy models with known w(z), to reconstruct
the equation of state. We can then evaluate the quality of the reconstruction by
comparing the reconstructed equation of state with the known w(z).

We generate the simulated Hubble parameter data based on the observational Hubble parameter
data (OHD) set. Current observational Hubble parameter data are obtained primarily from the
method of cosmic chronometers [23-26]. Other methods to extract H(z) use observations
of BAO peaks [27, 28] and the Ly-α forest of luminous red galaxies (LRGs) [29], which have
extended the current OHD up to z = 2.3. Ref. [30] provides the data set we use, which covers
several independent measurements of H(z). There are 29 data points in the data set in total
(H_0 included). As the redshift distribution of these data points is far from
uniform, we split the redshift range (0 < z < 2.3) into 15 bins. The binned
observational Hubble parameter data set is shown in figure 1.

To generate a simulated Hubble parameter data set, we need a fiducial model
characterized by an equation of state w(z), from which we can get the theoretical
Hubble parameter H_th via (2.2). In addition to the underlying model,
we also need an error model, which estimates the deviations of the simulated Hubble parameter
data from the theoretical values. There have been some studies on how to obtain an error
model from observational Hubble parameter data. For instance, in [17] Yu et al. suggested
that the errors of observational Hubble parameter data follow a Nakagami-m distribution, while
in [31] the center line between the upper and lower heuristic bounds of the observational
Hubble parameter data was used as the error model for simulated Hubble parameter data. If the
uncertainty of the observational data is strongly dependent on redshift, we can fit the error of
the observational Hubble parameter data. The error model for simulated Hubble parameter
data that we obtained from the observational Hubble parameter data is shown in figure 2.

Figure 2. Error model obtained from the observational Hubble parameter data. The curve is obtained
through polynomial fitting to the errors of the binned observational Hubble parameter data, which
are plotted as black dots.

Finally, with the fiducial model and the error model discussed above, we can generate
the simulated Hubble parameter data via

$$ H_{\rm sim}(z) = H_{\rm th}(z) + G(0, \sigma(z)), \tag{3.1} $$

where σ(z) is the error function and G(μ, σ) is a Gaussian distribution with
mean μ and standard deviation σ.
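
A minimal sketch of (3.1) follows, assuming NumPy. The function names, the polynomial degree and all numbers in the example are made up for illustration; in particular, `sigma_binned` is a placeholder and not the binned data of figure 1.

```python
import numpy as np

def fit_error_model(z_binned, sigma_binned, degree=2):
    """Polynomial fit of sigma(z) to the binned errors (the degree is an assumption)."""
    coeffs = np.polyfit(z_binned, sigma_binned, degree)
    return lambda z: np.polyval(coeffs, z)

def simulate_hubble(z, h_th, sigma_of_z, scale=1.0, rng=None):
    """Eq. (3.1): H_sim = H_th + G(0, sigma(z)); `scale` rescales the error level."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = scale * sigma_of_z(z)
    return h_th + rng.normal(0.0, sigma), sigma

# Placeholder inputs, NOT the observational data used in the paper
z_binned = np.linspace(0.1, 2.3, 15)
sigma_binned = 5.0 + 8.0 * z_binned ** 2
sigma_of_z = fit_error_model(z_binned, sigma_binned)

# Fiducial LCDM curve (w = -1) with the parameters quoted in section 2
h_th = 67.4 * np.sqrt(0.6825 + 0.3175 * (1.0 + z_binned) ** 3)
h_sim, sigma = simulate_hubble(z_binned, h_th, sigma_of_z, scale=0.2,
                               rng=np.random.default_rng(1))
```

Setting `scale=0.2` mimics a data set whose errors are 20% of the fitted error model, and `scale=1.0` keeps the fitted error level unchanged.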

4 Criteria for component selection

We have discussed the reconstruction process in section 2. However, the reconstruction is
not complete: we need a criterion to determine how many components to keep in the
reconstruction. In this section, we introduce the criterion we use for component selection.

How many components should be kept is still an open question in principal component
analysis. Many criteria have been proposed to solve this problem since PCA was
invented in 1901, e.g. the Akaike information criterion (AIC), the Bayesian information
criterion (BIC) [32], and combinations of them. Yu et al. used a criterion called Goodness of Fit [17].
Although these criteria work well in general, they have defects: they introduce
either additional parameters or additional assumptions, such as s in the combination of
AIC and BIC [33]. Furthermore, choosing these parameters arbitrarily may lead to quite
different reconstruction results in some particular problems.

In this paper, we introduce a criterion based on AIC. As there are only dozens of data points,
we replace AIC with AICc (the small-sample-size corrected version of AIC). The expression
of AICc is

$$ \mathrm{AICc} = \chi^2_{\min} + \frac{2N}{N - M - 1}\, M, $$

where M is the number of parameters, i.e. the number of components we keep, N is the
size of the data set, and χ²_min is the minimum χ², which measures the deviation from the
observational or simulated data. It is
expected that reducing M will be accompanied by a reduction in the error but an increased
chance of getting w(z) wrong [33]: w(z) is reconstructed less accurately (so the
bias increases), but the error bars are smaller (so the variance decreases) [34]. The
challenge is therefore to achieve a balance between bias and variance. To this end,
we add an empirical parameter s to the second (penalty) term of AICc.
The modified AICc criterion is

$$ \mathrm{AICc} = \chi^2_{\min} + \left[ \text{penalty term } \frac{2N}{N - M - 1}\, M \text{ of AICc, modified by } s \right], $$

where s is the tuning factor we add to AICc to adjust the penalty for more parameters and
to avoid overfitting.
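
The selection rule can be sketched as follows. This uses the standard small-sample AICc, χ²_min + 2MN/(N − M − 1), and does not include the tuning factor s; a modified criterion could be passed in through the hypothetical `criterion` argument. The χ² values in the example are made up.

```python
def aicc(chi2_min, n_params, n_data):
    """Standard small-sample AICc: chi2_min + 2*M*N / (N - M - 1)."""
    if n_data - n_params - 1 <= 0:
        raise ValueError("AICc needs N > M + 1")
    return chi2_min + 2.0 * n_params * n_data / (n_data - n_params - 1)

def select_n_components(chi2_by_m, n_data, criterion=aicc):
    """Return the M whose criterion value is smallest.

    chi2_by_m maps each candidate M to the minimum chi-square of that fit.
    """
    return min(chi2_by_m, key=lambda m: criterion(chi2_by_m[m], m, n_data))

# Made-up chi-square values, for illustration only
chi2_by_m = {1: 40.0, 2: 18.0, 3: 12.0, 4: 11.5, 5: 11.4}
best_m = select_n_components(chi2_by_m, n_data=15)
```

In this toy case χ² keeps falling as M grows, but beyond three components the gain is smaller than the growth of the penalty term, which is the trade-off the criterion is meant to capture.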

The value of s is determined empirically, consistent with the requirement that the best-fit
results have the minimum AICc values. In practice, we use the reconstruction results of different
models to constrain s. Since the fiducial model is known, we can choose the proper number of
components by maximum likelihood estimation. The joint distribution of the data points
follows a chi-square distribution, so the likelihood function can be expressed as

$$ \mathcal{L} = \exp(-\chi^2), $$

$$ \chi^2 = \sum_{i=1} \frac{(w_i - \mu_i)^2}{\sigma_i^2}, $$

where w is the reconstructed equation of state, and μ is the underlying equation of state
with error σ. When we choose a set of components generated by PCA, we can calculate
the associated χ². When χ² reaches its minimum, i.e. the likelihood function reaches its
maximum, the corresponding set of components is what we need in the final reconstruction.
Once the number of components is fixed, we can calculate the AICc to constrain s and thus
determine its value. For a given s, we can apply the AICc criterion to component
selection. In other words, we pick the number of components that makes the AICc
reach its minimum.
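
When the fiducial w(z) is known, the likelihood-based choice of M described above amounts to comparing each reconstruction with the fiducial curve. The sketch below assumes NumPy; the function names and the toy reconstructions are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def chi2_w(w_rec, w_fid, sigma_w):
    """Chi-square distance between a reconstructed and the fiducial w(z)."""
    return float(np.sum(((w_rec - w_fid) / sigma_w) ** 2))

def best_m_by_likelihood(reconstructions, w_fid):
    """Pick the M with the smallest chi-square, i.e. the largest exp(-chi2).

    reconstructions maps M to a pair (w_rec, sigma_w) sampled on one z grid.
    """
    scores = {m: chi2_w(w, w_fid, s) for m, (w, s) in reconstructions.items()}
    return min(scores, key=scores.get), scores

# Toy numbers for illustration only
z = np.linspace(0.0, 2.0, 5)
w_fid = -np.ones_like(z)
reconstructions = {
    1: (w_fid + 0.30, np.full_like(z, 0.1)),
    2: (w_fid + 0.05, np.full_like(z, 0.1)),
    3: (w_fid + 0.20 * np.sin(3 * z), np.full_like(z, 0.1)),
}
best_m, scores = best_m_by_likelihood(reconstructions, w_fid)
```

The M found this way for the fiducial models is what the text then uses to constrain s, so that the AICc-based rule selects the same number of components.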

5 Results 

Following the reconstruction process described in section 2, in this section we reconstruct the
equation of state from simulated Hubble parameter data generated from fiducial models
(with pre-set equation of state w(z)). The error of the simulated Hubble parameter data is
20% of that of the observational Hubble parameter data. We reconstruct three w(z):
a. ΛCDM model, w(z) = −1; b. model-1, w(z) = −tanh(1/z); c. model-2, w(z) = −1 + [1 −
tanh(1/z)] sin(z^…). Their equations of state are shown in figure 3.
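
For concreteness, the first two fiducial models can be coded as below. This is a sketch that reads model-1 as w(z) = −tanh(1/z), which matches the limits given in the caption of figure 3 (−1 at present, approaching 0 at high redshift). Model-2 is omitted because the argument of its sine factor is not legible here. The function names are hypothetical.

```python
import numpy as np

def w_lcdm(z):
    """Model a: cosmological constant, w(z) = -1."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    return -np.ones_like(z)

def w_model1(z):
    """Model-1, read as w(z) = -tanh(1/z), with the z -> 0 limit set to -1."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = -np.ones_like(z)          # value at z = 0, where tanh(1/z) -> 1
    pos = z > 0
    out[pos] = -np.tanh(1.0 / z[pos])
    return out
```

Handling z = 0 separately avoids a division by zero while keeping the function continuous there.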

Figure 3. Equation of state for the three models. The solid line shows the equation of state of the
ΛCDM model, the dashed line that of model-1, and the dash-dot line that
of model-2. For model-1, w(z) is −1 at present and approaches 0 as redshift increases. For model-2,
w(z) oscillates around −1 with an amplitude that increases from 0 to 1.

The results of the reconstructions are shown in figure 4. The reconstruction
is reliable: the pre-set w(z) lie well within the 1σ regions of the reconstructed w(z). Furthermore,
the reconstructed w(z) follow the variation trends of the pre-set w(z). In terms of variance,
the variances of the reconstructed w(z) are under control at 0 < z < 2.0, but at
z > 2.0 they increase dramatically. The right panels in figure 4 show that the
Hubble parameter (for convenience, we plot H(z)/(1 + z) instead of H(z)) calculated from the
reconstructed w(z) is consistent with the Hubble parameter obtained from the pre-set w(z).
The orange region is narrow, which implies that the variance of the reconstructed Hubble
parameter is very small.

After reconstructing the equation of state for the fiducial models, we obtain the optimum
value of s using the method described in section 4. Then we apply the modified AICc criterion
to reconstruct the equation of state from Hubble parameter data with the current error level.
We follow the same process to reconstruct w(z) from simulated Hubble parameter data (with
underlying model-1) whose error level is similar to that
of the observational Hubble parameter data. The results are shown in figure 5. The
reconstructed w(z) of the simulated Hubble parameter data (with model-1) closely follows
the underlying w(z), denoted w_model-1, at 0 < z < 1.5, but
deviates from w_model-1 rapidly at z > 1.5,
and its variance is large. We hence conservatively conclude that
the reconstruction is robust at 0 < z < 1.0. We also show w(z) reconstructed from the
observational Hubble parameter data without a model, denoted w_reco. Compared to the reconstructed
w(z) of the simulated Hubble parameter data (with model-1), w_reco oscillates more wildly. The
right panel in figure 5 shows the reconstructed Hubble parameter. Although the
Hubble parameter calculated from the reconstructed w(z) of the simulated Hubble parameter data
(with model-1) is consistent with the theoretical Hubble parameter of model-1, its variance is
very large. Figure 5 also shows the Hubble parameter calculated from w_reco, which lies outside the
1σ region of the Hubble parameter calculated from the reconstructed w(z) of the simulated Hubble
parameter data (with model-1) and deviates considerably from the real observational Hubble
parameter data.

Figure 4. (color online). Left column: Reconstructed w(z) for different models. The light blue,
green and orange shades show the 1σ, 2σ and 3σ errors respectively. The black dashed line is the
pre-set w(z), and the red solid line shows the reconstructed w(z). Right column: Hubble parameter
H(z)/(1 + z). The red solid line is the Hubble parameter calculated from the reconstructed w(z).
The orange shade shows its 1σ error, and the black dashed line shows the Hubble parameter
calculated from the pre-set w(z).


Figure 5. (color online). Left: The red solid line is the reconstructed w(z) of the simulated Hubble
parameter data (with model-1). The light blue, green and orange shades show the 1σ, 2σ and 3σ
errors respectively. The black dashed line is w(z) of the underlying model, and the blue dash-dot
line shows the reconstructed w(z) of the observational Hubble parameter data (with no underlying
model). Right: The red solid line is the Hubble parameter calculated from the reconstructed w(z) of
the simulated Hubble parameter data (with model-1), with the orange shade showing its 1σ error.
The black dashed line shows the Hubble parameter calculated from w(z) of the underlying model, and
the blue dash-dot line shows the Hubble parameter calculated from the reconstructed w(z) of the
observational Hubble parameter data (with no underlying model).

6 Conclusion and discussion

We apply principal component analysis to reconstruct the dark energy equation of state. We
represent w(z) as a linear combination of a set of orthogonal basis functions to avoid information
loss. In addition, we modify the Akaike information criterion (AIC) to balance bias and
variance in the reconstruction. We then reconstruct the equation of state for three models
from simulated Hubble parameter data (with errors at 20% of the current observational error level)
to validate our reconstruction process. The results show that we can constrain the equation
of state quite well at 0 < z < 2.0, and the Hubble parameter calculated from the reconstructed
w(z) agrees very well with the Hubble parameter calculated from the pre-set w(z), as the 1σ
region is very narrow. However, the differences between the Hubble parameters derived from
different models are very small, although the models are quite different. This is also the reason
why reconstructing the dark energy equation of state from Hubble parameter data is difficult.
Nevertheless, this study shows that future observations with 20% of the error of current
observational data will help to constrain the dark energy equation of state up to redshift z ~ 2.0.
We confirm that future observations with larger quantity and better quality will greatly
improve the reconstruction of dark energy [17], and that a small sample collected at much higher
redshift will reduce the errors more efficiently than collecting more events in the original
redshift interval [35].

In the case of reconstructing the dark energy equation of state from real observational Hubble
parameter data, we can constrain w(z) at 0 < z < 1.0 with an underlying model. But the
reconstructed w(z) deviates considerably from the w(z) of the underlying model at z > 1.0, as the
quality of the data is very poor. If there is no underlying model, the reconstructed w(z) oscillates
even more wildly, and we cannot know whether it is close to the true w(z).

Since we do not understand the nature of dark energy, it is possible that dark energy
leads to other observable effects such as a new long-range force [36]. More accurate observations
are needed to reveal the secrets of dark energy.